Hybrid Algorithm for Clustering Gene Expression Data
Abstract
Microarray gene expression data provide insight into genomic biomarkers that help distinguish cancerous cells from normal cells. In this study, functionally related genes are identified by partitioning gene data. Clustering is an unsupervised learning technique that partitions gene data into groups based on the similarity between their expression profiles, which identifies functionally related genes. A hybrid framework is designed that combines the adaptive pillar clustering algorithm with a genetic algorithm. In the first step, the adaptive pillar clustering algorithm finds the cluster centroids; the centroids and their cluster members are calculated from the average of the pairwise inner distances. The microarray gene expression data set is given as input to the adaptive pillar clustering algorithm, which partitions the gene data into a given number of clusters so that intra-cluster similarity is maximized and inter-cluster similarity is minimized. The resulting clusters are given as input to the genetic algorithm. Then, for each combination of clustered gene expressions, the optimum cluster is found using the genetic algorithm. The genetic algorithm initializes its population with the set of clusters obtained from the adaptive pillar clustering algorithm. The chromosomes with maximum fitness are selected from the selection pool for the genetic operations of crossover and mutation. The genetic algorithm searches for optimum clusters based on its fitness function, which minimizes the intra-cluster distance and maximizes the fitness value by tailoring a parameter that includes the weightage for diseased genes. The performance of the adaptive pillar algorithm was compared with existing techniques such as the k-means and pillar k-means algorithms. The clusters obtained from the adaptive pillar clustering algorithm achieve a maximum cluster gain of 894.84, 812.4 and 756 for leukemia, lung and thyroid gene expression data, respectively. Further, the optimal cluster obtained by the hybrid framework achieves a cluster accuracy of 81.3, 80.2 and 78.2 for leukemia, lung and thyroid gene expression data, respectively.
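The second stage described above, a genetic algorithm seeded with centroids from a prior clustering step, can be illustrated with a minimal sketch. It is not the authors' implementation. The fitness form `1 / (1 + intra-cluster distance)`, the single-point crossover, the Gaussian mutation, and all function names are assumptions made for illustration. The abstract does not specify the weighting for diseased genes, so it is omitted here, and `seed_centroids` stands in for whatever the adaptive pillar clustering step would produce.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points]

def intra_cluster_distance(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    labels = assign(points, centroids)
    return sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))

def fitness(points, centroids):
    # Assumed form: smaller intra-cluster distance gives higher fitness.
    return 1.0 / (1.0 + intra_cluster_distance(points, centroids))

def crossover(a, b, rng):
    # Single-point crossover over the list of centroids (an assumption).
    if len(a) < 2:
        return list(a)
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(chrom, rng, rate=0.1, scale=0.1):
    # Perturb each coordinate with probability `rate` by Gaussian noise.
    return [tuple(x + rng.gauss(0, scale) if rng.random() < rate else x
                  for x in c) for c in chrom]

def refine(points, seed_centroids, generations=50, pop_size=20, seed=0):
    """Evolve centroid sets, starting from the seed clustering."""
    rng = random.Random(seed)
    # Initial population: the seed itself plus fully mutated copies of it.
    population = [list(seed_centroids)] + [
        mutate(seed_centroids, rng, rate=1.0) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=lambda c: fitness(points, c), reverse=True)
        parents = population[: pop_size // 2]  # fittest half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return max(population, key=lambda c: fitness(points, c))
```

Because the fittest half of the population is carried over unchanged each generation, the best fitness never decreases, so the returned centroids are at least as fit as the seed under this assumed fitness function.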
Similar Resources
Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis
Recognizing genes with distinctive expression levels can help in the prevention, diagnosis and treatment of diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...
A Hybrid Data Clustering Algorithm Using Modified Krill Herd Algorithm and K-MEANS
Data clustering is the process of partitioning a set of data objects into meaningful clusters or groups. Because clustering algorithms are widely used in many fields, a lot of research is still going on to find the best and most efficient clustering algorithm. K-means is simple and easy to implement, but it is sensitive to the initialization of cluster centers and hence gets trapped in local optima. In this paper...
Clustering Gene Expression Data by Random Forest Dissimilarity
Background: The clustering of gene expression data plays an important role in the diagnosis and treatment of cancer. These kinds of data typically involve a large number of variables (genes) compared with the number of samples (patients). Many clustering methods have been built based on the dissimilarity among observations, which is calculated by a distance function. As increa...
Tabu-KM: A Hybrid Clustering Algorithm Based on Tabu Search Approach
The clustering problem under the minimum sum-of-squares criterion is a non-convex, non-linear program with many locally optimal values, so its solution often falls into these traps and fails to converge to the globally optimal solution. In this paper, an efficient hybrid optimization algorithm, called Tabu-KM, is developed for solving this problem. It gathers the ...
GROUND MOTION CLUSTERING BY A HYBRID K-MEANS AND COLLIDING BODIES OPTIMIZATION
The stochastic nature of earthquakes makes it challenging for engineers to choose which records to use in their analyses. Clustering is offered as a solution to this data mining problem, automatically distinguishing between ground motion records based on similarities in the corresponding seismic attributes. The present work formulates an optimization problem to seek the best clustering measures. In...